Language Transfer Hypotheses with Linear SVM Weights

Authors

  • Shervin Malmasi
  • Mark Dras
Abstract

Language transfer, the characteristic second language usage patterns caused by native language interference, is investigated by Second Language Acquisition (SLA) researchers seeking to find overused and underused linguistic features. In this paper we develop and present a methodology for deriving ranked lists of such features. Using a very large learner corpus, we show our method's ability to find relevant candidates using sophisticated linguistic features. To illustrate its applicability to SLA research, we formulate plausible language transfer hypotheses supported by current evidence. This is the first work to extend Native Language Identification to a broader linguistic interpretation of learner data and address the automatic extraction of underused features on a per-native-language basis.
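The weight-based ranking at the heart of such a method can be sketched as follows. This is a toy illustration, not the authors' implementation: the feature names and weights are hypothetical, standing in for the coefficient vector of a one-vs-rest linear SVM trained to detect one native language (L1) against the rest. Positive weights mark overuse candidates for that L1; negative weights mark underuse candidates.

```python
# Hypothetical feature names and weights, standing in for the coefficient
# vector of a one-vs-rest linear SVM trained to detect one L1.
svm_weights = {
    "DET_omission": 1.92,          # hypothetical syntactic/error features
    "PREP_in_for_at": 1.31,
    "VBZ_agreement_error": 0.87,
    "relative_clause_that": -0.42,
    "present_perfect": -1.05,
    "phrasal_verb": -1.64,
}

# Positive weights push the classifier toward this L1: overuse candidates.
# Negative weights push away from it: underuse candidates.
overused = [f for f, w in sorted(svm_weights.items(), key=lambda kv: -kv[1]) if w > 0]
underused = [f for f, w in sorted(svm_weights.items(), key=lambda kv: kv[1]) if w < 0]

print("overuse candidates :", overused)
print("underuse candidates:", underused)
```

The ranked lists are then what an SLA researcher would inspect when formulating transfer hypotheses for that native language.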


Similar Articles

Cross-Linguistic Transfer or Target Language Proficiency: Writing Performance of Trilinguals vs. Bilinguals in Relation to the Interdependence Hypothesis

This study explored the nature of transfer among bilinguals vs. trilinguals with varying levels of competence in English and their previous languages. The hypotheses were tested in writing tasks designed for 75 EFL learners of high (N=35) vs. intermediate (N=40) proficiency, with Turkish/Persian/English and Persian/English linguistic backgrounds. Qualitative data were also collected through some ...


Identifying Sexual Predators by SVM Classification with Lexical and Behavioral Features

We identify sexual predators in a large corpus of web chats using SVM classification with a bag-of-words model over unigrams and bigrams. We find this simple lexical approach to be quite effective with an F1 score of 0.77 over a 0.003 baseline. By also encoding the language used by an author’s partners and some small heuristics, we boost performance to an F1 score of 0.83. We identify the most ...
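The lexical representation described above can be sketched in a few lines. This is a minimal stand-in for a bag-of-words pipeline, with a hypothetical `bow_features` helper and whitespace tokenization as simplifying assumptions; the paper's classifier would consume these counts as SVM input features.

```python
from collections import Counter

def bow_features(text):
    """Unigram + bigram counts over lowercased whitespace tokens."""
    tokens = text.lower().split()
    feats = Counter(tokens)                # unigrams
    feats.update(zip(tokens, tokens[1:]))  # bigrams, stored as token pairs
    return feats

feats = bow_features("Hey how old are you")
```

A real system would additionally hash or index these features into a sparse vector before SVM training.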


Multiple Random Subset-Kernel Learning

In this paper, the multiple random subset-kernel learning (MRSKL) algorithm is proposed. In MRSKL, a subset of training samples is randomly selected for each kernel with randomly set parameters, and the kernels with optimal weights are combined for classification. A linear support vector machine (SVM) is adopted to determine the optimal kernel weights; therefore, MRSKL is based on a hierarchica...
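The overall MRSKL structure can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's exact procedure: each base kernel's decision function is replaced by a plain label-weighted RBF similarity score over its random subset, and the combining linear SVM is trained with a Pegasos-style subgradient method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class data
X = rng.normal(size=(60, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)

def rbf(A, B, gamma):
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def make_base(X, y, rng):
    """One base kernel: random sample subset + randomly set gamma.
    The score is a label-weighted similarity (a simplification of a
    per-kernel decision function)."""
    idx = rng.choice(len(X), size=20, replace=False)
    gamma = rng.uniform(0.1, 2.0)
    return lambda Z: rbf(Z, X[idx], gamma) @ y[idx]

bases = [make_base(X, y, rng) for _ in range(5)]
F = np.stack([b(X) for b in bases], axis=1)   # per-kernel decision values

# Combine with a linear SVM (Pegasos-style subgradient steps) to learn
# the kernel combination weights.
w = np.zeros(F.shape[1])
lam = 0.01
for t in range(1, 501):
    margins = y * (F @ w)
    viol = margins < 1
    grad = lam * w - (F[viol] * y[viol, None]).sum(0) / len(y)
    w -= (1.0 / (lam * t)) * grad

acc = ((F @ w > 0) == (y > 0)).mean()
```

The learned `w` plays the role of the optimal kernel weights; the hierarchical structure comes from stacking kernel-level scores under a linear model.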


Polynomial to Linear: Efficient Classification with Conjunctive Features

This paper proposes a method that speeds up a classifier trained with many conjunctive features: combinations of (primitive) features. The key idea is to precompute as partial results the weights of primitive feature vectors that appear frequently in the target NLP task. A trie compactly stores the primitive feature vectors with their weights, and it enables the classifier to find for a given f...
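The precomputation idea can be illustrated with pairwise conjunctive features. All feature names and weights below are hypothetical, and a plain dict stands in for the trie: a frequent primitive feature vector is cached with its partial score, and the classifier only has to add the conjunctions involving the remaining features.

```python
from itertools import combinations

# Hypothetical weights for conjunctive (pairwise) features,
# keyed by sorted feature pairs.
w = {
    ("POS=NN", "cap"): 0.9,
    ("POS=NN", "suffix=ing"): -0.4,
    ("cap", "suffix=ing"): 0.2,
    ("POS=NN", "first_word"): 0.5,
    ("cap", "first_word"): 0.7,
    ("first_word", "suffix=ing"): -0.1,
}

def score_naive(feats):
    """Sum weights of all pairwise conjunctions of the primitive features."""
    return sum(w.get(p, 0.0) for p in combinations(sorted(feats), 2))

# Precompute the partial score of a frequently occurring primitive vector
# (a dict stands in for the paper's trie).
frequent = ("POS=NN", "cap", "suffix=ing")
cache = {frequent: score_naive(frequent)}

def score_fast(feats):
    feats = sorted(feats)
    for prefix, partial in cache.items():
        if set(prefix) <= set(feats):
            extra = [f for f in feats if f not in prefix]
            # cached partial result + conjunctions touching the extras
            cross = sum(w.get(tuple(sorted((p, e))), 0.0)
                        for p in prefix for e in extra)
            rest = sum(w.get(p, 0.0) for p in combinations(extra, 2))
            return partial + cross + rest
    return score_naive(feats)

x = ["POS=NN", "cap", "suffix=ing", "first_word"]
```

For inputs that extend a cached vector, only the cross and residual terms are computed at test time, which is the source of the speedup.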


Improved Natural Language Learning via Variance-Regularization Support Vector Machines

We present a simple technique for learning better SVMs using fewer training examples. Rather than using the standard SVM regularization, we regularize toward low weight-variance. Our new SVM objective remains a convex quadratic function of the weights, and is therefore computationally no harder to optimize than a standard SVM. Variance regularization is shown to enable dramatic improvements in ...
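The modified objective can be sketched in a few lines: the usual squared-norm penalty is replaced by lam * sum_j (w_j - mean(w))^2, which is still a convex quadratic in w. The data, learning rate, and lam below are toy values chosen for illustration, not the paper's setup.

```python
import numpy as np

def objective(w, X, y, lam):
    """Mean hinge loss + variance regularizer (convex quadratic in w)."""
    hinge = np.maximum(0.0, 1.0 - y * (X @ w)).mean()
    return hinge + lam * ((w - w.mean()) ** 2).sum()

def grad(w, X, y, lam):
    viol = (y * (X @ w)) < 1
    g_hinge = -(X[viol] * y[viol, None]).sum(0) / len(y)
    # d/dw_j of sum_i (w_i - wbar)^2 simplifies to 2 * (w_j - wbar)
    return g_hinge + 2 * lam * (w - w.mean())

# Tiny linearly separable toy problem
X = np.array([[2.0, 1.0], [1.0, 2.0], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = np.zeros(2)
for _ in range(2000):
    w -= 0.05 * grad(w, X, y, 0.1)

preds = np.sign(X @ w)
```

Because the regularizer penalizes spread around the mean weight rather than weight magnitude, it pulls the weights toward each other, which is the effect the paper exploits with few training examples.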


Journal:

Volume   Issue

Pages  -

Publication date: 2014